Applications in Computer Vision
projecting the real-valued (32-bit) variable $x$ onto a set as
$$Q = \{a_1, a_2, \cdots, a_n\}, \qquad (6.1)$$
where $Q$ is a discrete set and $n$ is the number of elements in $Q$. For example, $n$ is set to $2^{16}$ when performing 16-bit quantization. Then, we define the projection of $x \in \mathbb{R}$ onto the set $Q$ as
$$P_{\mathbb{R}\to Q}(x) =
\begin{cases}
a_1, & x < \frac{a_1 + a_2}{2} \\
\quad\cdots \\
a_i, & \frac{a_{i-1} + a_i}{2} \le x < \frac{a_i + a_{i+1}}{2} \\
\quad\cdots \\
a_n, & \frac{a_{n-1} + a_n}{2} \le x.
\end{cases} \qquad (6.2)$$
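The projection of Eq. 6.2 simply snaps each real value to the nearest level in $Q$, with the midpoints between adjacent levels acting as decision boundaries. A minimal sketch (the helper name `project_to_q` and the example levels are our own, not from the text):

```python
import numpy as np

def project_to_q(x, levels):
    """Project real values x onto the nearest level in Q = {a_1, ..., a_n},
    following the midpoint rule of Eq. 6.2."""
    levels = np.sort(np.asarray(levels, dtype=np.float64))
    # Midpoints (a_i + a_{i+1}) / 2 split the real line into n intervals.
    midpoints = (levels[:-1] + levels[1:]) / 2.0
    # searchsorted finds the interval that contains each x, i.e., its a_i;
    # side="right" assigns a boundary point to the upper level, as in Eq. 6.2.
    idx = np.searchsorted(midpoints, x, side="right")
    return levels[idx]

x = np.array([-0.7, -0.1, 0.2, 0.9])
q = project_to_q(x, levels=[-1.0, 0.0, 1.0])
# q == [-1., 0., 0., 1.]
```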
By projecting 32-bit weights and activations onto low-bit sets, the computational cost is greatly reduced. In the extreme case, binarizing the weights and activations of neural networks decreases the storage and computation cost by 32× and 64×, respectively.
Considering the binarization process of BNNs, Eqs. 6.1 and 6.2 are relaxed into
$$P_{\mathbb{R}\to B}(x) =
\begin{cases}
-1, & x < 0 \\
+1, & 0 \le x
\end{cases}, \quad \text{s.t. } B = \{-1, +1\}, \qquad (6.3)$$
where we set $a_1 = -1$ and $a_2 = +1$. Then $P_{\mathbb{R}\to B}(\cdot)$ is equivalent to the sign function, i.e., $\mathrm{sign}(\cdot)$.
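Concretely, Eq. 6.3 matches the sign function up to the convention at zero: Eq. 6.3 maps $x = 0$ to $+1$, whereas `np.sign(0)` returns 0. A small sketch of this equivalence (example values are our own):

```python
import numpy as np

x = np.array([-0.3, 0.0, 2.5])
# P_{R->B} from Eq. 6.3: -1 for x < 0, +1 for 0 <= x.
b = np.where(x < 0, -1.0, 1.0)
# np.sign maps 0 to 0; fixing the convention sign(0) = +1 recovers P_{R->B}.
s = np.sign(x)
s[s == 0] = 1.0
# b == s == [-1., 1., 1.]
```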
The learning objective of conventional BNNs (XNOR-Net) is defined to minimize the geometric distance between $x$ and $P_{\mathbb{R}\to B}(x)$ as
$$\arg\min_{x,\alpha} \|x - \alpha P_{\mathbb{R}\to B}(x)\|_2^2, \qquad (6.4)$$
where $\alpha$ is an auxiliary scale factor. Recent works on binarized neural networks (BNNs) [199, 159] solve this objective explicitly as
$$\alpha = \frac{\|x\|_1}{\mathrm{size}(x)}, \qquad (6.5)$$
where size(x) denotes the number of elements in x. However, this objective is insufficient to
maintain the information of the real-valued counterpart x. To overcome this shortcoming,
we introduce the kernel refining convolution.
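The closed-form solution of Eqs. 6.4 and 6.5 amounts to taking the sign of $x$ and rescaling by the mean absolute value. A minimal XNOR-Net-style sketch (the helper name `xnor_binarize` is our own):

```python
import numpy as np

def xnor_binarize(x):
    """Closed-form solution of Eq. 6.4: binarize x with P_{R->B}
    and scale by alpha = ||x||_1 / size(x) (Eq. 6.5)."""
    alpha = np.abs(x).sum() / x.size           # Eq. 6.5: mean absolute value
    b = np.where(x < 0, -1.0, 1.0)             # P_{R->B}(x), Eq. 6.3
    return alpha, b

x = np.array([0.5, -1.5, 2.0, -1.0])
alpha, b = xnor_binarize(x)
# alpha == 1.25, b == [1., -1., 1., -1.]
```

The scaled approximation $\alpha\, b$ is the best least-squares fit to $x$ among all vectors of the form $\alpha P_{\mathbb{R}\to B}(x)$, which is exactly why, as the text notes, it can still lose information about the real-valued $x$.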
Furthermore, XNOR-Net, like most BNNs, suffers from intra-channel feature homogenization, which degrades the feature representation capacity. Hence, a new feature refinement method should be introduced.
6.2.2 Kernel Refining Generative Adversarial Learning (KR-GAL)
Given a conventional CNN model, we denote $w_i \in \mathbb{R}^{n_i}$ and $a_i \in \mathbb{R}^{m_i}$ as its weights and feature maps in the $i$-th layer, where $n_i = C_i \cdot C_{i-1} \cdot K_i \cdot K_i$ and $m_i = C_i \cdot W_i \cdot H_i$. Here $C_i$ represents the number of output channels of the $i$-th layer, $(W_i, H_i)$ are the width and height of the feature maps, and $K_i$ is the kernel size. Then we have
$$a_i = a_{i-1} \otimes w_i, \qquad (6.6)$$
where $\otimes$ is the convolution operation. As mentioned above, the BNN model aims to binarize $w_i$ and $a_i$ into $P_{\mathbb{R}\to B}(w_i)$ and $P_{\mathbb{R}\to B}(a_i)$. For simplicity, in this chapter we denote $P_{\mathbb{R}\to B}(w_i)$ and $P_{\mathbb{R}\to B}(a_i)$ as $bw_i \in \mathbb{B}^{n_i}$ and $ba_i \in \mathbb{B}^{m_i}$, respectively.
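The dimension bookkeeping above can be made concrete with a small sketch (the layer sizes below are hypothetical example values, not from the text):

```python
import numpy as np

# Notation of the chapter for layer i (example sizes are our own):
C_i, C_im1, K_i = 16, 8, 3       # output channels, input channels, kernel size
W_i, H_i = 32, 32                # feature-map width and height

n_i = C_i * C_im1 * K_i * K_i    # number of weight elements:     w_i in R^{n_i}
m_i = C_i * W_i * H_i            # number of activation elements: a_i in R^{m_i}

# Binarization maps every element to {-1, +1}:
# bw_i in B^{n_i}, ba_i in B^{m_i}.
w_i = np.random.randn(n_i)
bw_i = np.where(w_i < 0, -1.0, 1.0)
# n_i == 1152, m_i == 16384, and bw_i only contains -1 and +1.
```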